Search Results: "kees"

4 December 2016

Ben Hutchings: Linux Kernel Summit 2016, part 2

I attended this year's Linux Kernel Summit in Santa Fe, NM, USA and made notes on some of the sessions that were relevant to Debian. LWN also reported many of the discussions. This is the second and last part of my notes; part 1 is here. Kernel Hardening Kees Cook presented the ongoing work on upstream kernel hardening, also known as the Kernel Self-Protection Project or KSPP. GCC plugins The kernel build system can now build and use GCC plugins to implement some protections. This requires gcc 4.5 and the plugin headers installed. It has been tested on x86, arm, and arm64. It is disabled by CONFIG_COMPILE_TEST because CI systems using allmodconfig/allyesconfig probably don't have those installed, but this ought to be changed at some point. There was a question as to how plugin headers should be installed for cross-compilers or custom compilers, but I didn't hear a clear answer to this. Kees has been prodding distribution gcc maintainers to package them. Mark Brown mentioned the Linaro toolchain being widely used; Kees has not talked to its maintainers yet. Probabilistic protections These protections are based on hidden state that an attacker will need to discover in order to make an effective attack; they reduce the probability of success but don't prevent it entirely. Kernel address space layout randomisation (KASLR) has now been implemented on x86, arm64, and mips for the kernel image. (Debian enables this.) However there are still lots of information leaks that defeat this. This could theoretically be improved by relocating different sections or smaller parts of the kernel independently, but this requires re-linking at boot. Aside from software information leaks, the branch target predictor on (common implementations of) x86 provides a side channel to find addresses of branches in the kernel. Page and heap allocation, etc., is still quite predictable. struct randomisation (RANDSTRUCT plugin from grsecurity) reorders members in (a) structures containing only function pointers (b) explicitly marked structures. This makes it very hard to attack custom kernels where the kernel image is not readable. But even for distribution kernels, it increases the maintenance burden for attackers. Deterministic protections These protections block a class of attacks completely. Read-only protection of kernel memory is either mandatory or enabled by default on x86, arm, and arm64. (Debian enables this.) Protections against execution of user memory in kernel mode are now implemented in hardware on x86 (SMEP, in Intel processors from Skylake onward) and on arm64 (PXN, from ARMv8.1). But Skylake is not available for servers and ARMv8.1 is not yet implemented at all! s390 always had this protection. It may be possible to 'emulate' this using other hardware protections. arm (v7) and arm64 now have this, but x86 doesn't. Linus doesn't like the overhead of previously proposed implementations for x86. It is possible to do this using PCID (in Intel processors from Sandy Bridge onward), which has already been done in PaX - and this should be fast enough. Virtually mapped stacks protect against stack overflow attacks. They were implemented as an option for x86 only in 4.9. (Debian enables this.) Copies to or from user memory sometimes use a user-controlled size that is not properly bounded. Hardened usercopy, implemented as an option in 4.8 for many architectures, protects against this. (Debian enables this.) Memory wiping (zero on free) protects against some information leaks and use-after-free bugs. It was already implemented as debug feature with non-zero poison value, but at some performance cost. Zeroing can be cheaper since it allows allocator to skip zeroing on reallocation. That was implemented as an option in 4.6. (Debian does not currently enable this but we might do if the performance cost is low enough.) Constification (with the CONSTIFY gcc plugin) reduces the amount of static data that can be written to. As with RANDSTRUCT, this is applied to function pointer tables and explicitly marked structures. Instances of some types need to be modified very occasionally. In PaX/Grsecurity this is done with pax_ open,close _kernel() which globally disable write protection temporarily. It would be preferable to override write protection in a more directed way, so that the permission to write doesn't leak into any other code that interrupts this process. The feature is not in mainline yet. Atomic wrap detction protects against reference-counting bugs which can result in a use-after-free. Overflow and underflow are trapped and result in an 'oops'. There is no measurable performance impact. It would be applied to all operations on the atomic_t type, but there needs to be an opt-out for atomics that are not ref-counters - probably by adding an atomic_wrap_t type for them. This has been implemented for x86, arm, and arm64 but is not in mainline yet. Kernel Freezer Hell For the second year running, Jiri Kosina raised the problem of 'freezing' kthreads (kernel-mode threads) in preparation for system suspend (suspend to RAM, or hibernation). What are the semantics? What invariants should be met when a kthread gets frozen? They are not defined anywhere. Most freezable threads don't actually need to be quiesced. Also many non-freezable threads are pointlessly calling try_to_freeze() (probably due to copying code without understanding it)). At a system level, what we actually need is I/O and filesystem consistency. This should be achieved by: The system suspend code should not need to directly freeze threads. Kernel Documentation Jon Corbet and Mauro Carvalho presented the recent work on kernel documentation. The kernel's documentation system was a house of cards involving DocBook and a lot of custom scripting. Both the DocBook templates and plain text files are gradually being converted to reStructuredText format, processed by Sphinx. However, manual page generation is currently 'broken' for documents processed by Sphinx. There are about 150 files at the top level of the documentation tree, that are being gradually moved into subdirectories. The most popular files, that are likely to be referenced in external documentation, have been replaced by placeholders. Sphinx is highly extensible and this has been used to integrate kernel-doc. It would be possible to add extensions that parse and include the MAINTAINERS file and Documentation/ABI/ files, which have their own formats, but the documentation maintainers would prefer not to add extensions that can't be pushed to Sphinx upstream. There is lots of obsolete documentation, and patches to remove those would be welcome. Linus objected to PDF files recently added under the Documentation/media directory - they are not the source format so should not be there! They should be generated from the corresponding SVG or image files at build time. Issues around Tracepoints Steve Rostedt and Shuah Khan led a discussion about tracepoints. Currently each maintainer decides which tracepoints to create. The cost of each added tracepoint is minimal, but the cost of very many tracepoints is more substantial. So there is such a thing as too many tracepoints, and we need a policy to decide when they are justified. They advised not to create tracepoints just in case, since kprobes can be used for tracing (almost) anywhere dynamically. There was some support for requiring documentation of each new tracepoint. That may dissuade introduction of obscure tracepoints, but also creates a higher expectation of stability. Tools such as bcc and IOVisor are now being created that depend on specific tracepoints or even function names (through kprobes). Should we care about breaking them? Linus said that we should strive to be polite to developers and users relying on tracepoints, but if it's too painful to maintain a tracepoint then we should go ahead and change it. Where the end users of the tool are themselves developers it's more reasonable to expect them to upgrade the tool and we should care less about changing it. In some cases tracepoints could provide dummy data for compatibility (as is done in some places in procfs).

20 October 2016

Kees Cook: CVE-2016-5195

My prior post showed my research from earlier in the year at the 2016 Linux Security Summit on kernel security flaw lifetimes. Now that CVE-2016-5195 is public, here are updated graphs and statistics. Due to their rarity, the Critical bug average has now jumped from 3.3 years to 5.2 years. There aren t many, but, as I mentioned, they still exist, whether you know about them or not. CVE-2016-5195 was sitting on everyone s machine when I gave my LSS talk, and there are still other flaws on all our Linux machines right now. (And, I should note, this problem is not unique to Linux.) Dealing with knowing that there are always going to be bugs present requires proactive kernel self-protection (to minimize the effects of possible flaws) and vendors dedicated to updating their devices regularly and quickly (to keep the exposure window minimized once a flaw is widely known). So, here are the graphs updated for the 668 CVEs known today:

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

19 October 2016

Kees Cook: Security bug lifetime

In several of my recent presentations, I ve discussed the lifetime of security flaws in the Linux kernel. Jon Corbet did an analysis in 2010, and found that security bugs appeared to have roughly a 5 year lifetime. As in, the flaw gets introduced in a Linux release, and then goes unnoticed by upstream developers until another release 5 years later, on average. I updated this research for 2011 through 2016, and used the Ubuntu Security Team s CVE Tracker to assist in the process. The Ubuntu kernel team already does the hard work of trying to identify when flaws were introduced in the kernel, so I didn t have to re-do this for the 557 kernel CVEs since 2011. As the README details, the raw CVE data is spread across the active/, retired/, and ignored/ directories. By scanning through the CVE files to find any that contain the line Patches_linux: , I can extract the details on when a flaw was introduced and when it was fixed. For example CVE-2016-0728 shows:
Patches_linux:
 break-fix: 3a50597de8635cd05133bd12c95681c82fe7b878 23567fd052a9abb6d67fe8e7a9ccdd9800a540f2
This means that CVE-2016-0728 is believed to have been introduced by commit 3a50597de8635cd05133bd12c95681c82fe7b878 and fixed by commit 23567fd052a9abb6d67fe8e7a9ccdd9800a540f2. If there are multiple lines, then there may be multiple SHAs identified as contributing to the flaw or the fix. And a - is just short-hand for the start of Linux git history. Then for each SHA, I queried git to find its corresponding release, and made a mapping of release version to release date, wrote out the raw data, and rendered graphs. Each vertical line shows a given CVE from when it was introduced to when it was fixed. Red is Critical , orange is High , blue is Medium , and black is Low : CVE lifetimes 2011-2016 And here it is zoomed in to just Critical and High: Critical and High CVE lifetimes 2011-2016 The line in the middle is the date from which I started the CVE search (2011). The vertical axis is actually linear time, but it s labeled with kernel releases (which are pretty regular). The numerical summary is: This comes out to roughly 5 years lifetime again, so not much has changed from Jon s 2010 analysis. While we re getting better at fixing bugs, we re also adding more bugs. And for many devices that have been built on a given kernel version, there haven t been frequent (or some times any) security updates, so the bug lifetime for those devices is even longer. To really create a safe kernel, we need to get proactive about self-protection technologies. The systems using a Linux kernel are right now running with security flaws. Those flaws are just not known to the developers yet, but they re likely known to attackers, as there have been prior boasts/gray-market advertisements for at least CVE-2010-3081 and CVE-2013-2888. (Edit: see my updated graphs that include CVE-2016-5195.)

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

5 October 2016

Kees Cook: security things in Linux v4.8

Previously: v4.7. Here are a bunch of security things I m excited about in Linux v4.8: SLUB freelist ASLR Thomas Garnier continued his freelist randomization work by adding SLUB support. x86_64 KASLR text base offset physical/virtual decoupling On x86_64, to implement the KASLR text base offset, the physical memory location of the kernel was randomized, which resulted in the virtual address being offset as well. Due to how the kernel s -2GB addressing works (gcc s -mcmodel=kernel ), it wasn t possible to randomize the physical location beyond the 2GB limit, leaving any additional physical memory unused as a randomization target. In order to decouple the physical and virtual location of the kernel (to make physical address exposures less valuable to attackers), the physical location of the kernel needed to be randomized separately from the virtual location. This required a lot of work for handling very large addresses spanning terabytes of address space. Yinghai Lu, Baoquan He, and I landed a series of patches that ultimately did this (and in the process fixed some other bugs too). This expands the physical offset entropy to roughly $physical_memory_size_of_system / 2MB bits. x86_64 KASLR memory base offset Thomas Garnier rolled out KASLR to the kernel s various statically located memory ranges, randomizing their locations with CONFIG_RANDOMIZE_MEMORY. One of the more notable things randomized is the physical memory mapping, which is a known target for attacks. Also randomized is the vmalloc area, which makes attacks against targets vmalloced during boot (which tend to always end up in the same location on a given system) are now harder to locate. (The vmemmap region randomization accidentally missed the v4.8 window and will appear in v4.9.) x86_64 KASLR with hibernation Rafael Wysocki (with Thomas Garnier, Borislav Petkov, Yinghai Lu, Logan Gunthorpe, and myself) worked on a number of fixes to hibernation code that, even without KASLR, were coincidentally exposed by the earlier W^X fix. With that original problem fixed, then memory KASLR exposed more problems. I m very grateful everyone was able to help out fixing these, especially Rafael and Thomas. It s a hard place to debug. The bottom line, now, is that hibernation and KASLR are no longer mutually exclusive. gcc plugin infrastructure Emese Revfy ported the PaX/Grsecurity gcc plugin infrastructure to upstream. If you want to perform compiler-based magic on kernel builds, now it s much easier with CONFIG_GCC_PLUGINS! The plugins live in scripts/gcc-plugins/. Current plugins are a short example called Cyclic Complexity which just emits the complexity of functions as they re compiled, and Sanitizer Coverage which provides the same functionality as gcc s recent -fsanitize-coverage=trace-pc but back through gcc 4.5. Another notable detail about this work is that it was the first Linux kernel security work funded by Linux Foundation s Core Infrastructure Initiative. I m looking forward to more plugins! If you re on Debian or Ubuntu, the required gcc plugin headers are available via the gcc-$N-plugin-dev package (and similarly for all cross-compiler packages). hardened usercopy Along with work from Rik van Riel, Laura Abbott, Casey Schaufler, and many other folks doing testing on the KSPP mailing list, I ported part of PAX_USERCOPY (the basic runtime bounds checking) to upstream as CONFIG_HARDENED_USERCOPY. One of the interface boundaries between the kernel and user-space are the copy_to_user()/copy_from_user() family of functions. Frequently, the size of a copy is known at compile-time ( built-in constant ), so there s not much benefit in checking those sizes (hardened usercopy avoids these cases). In the case of dynamic sizes, hardened usercopy checks for 3 areas of memory: slab allocations, stack allocations, and kernel text. Direct kernel text copying is simply disallowed. Stack copying is allowed as long as it is entirely contained by the current stack memory range (and on x86, only if it does not include the saved stack frame and instruction pointers). For slab allocations (e.g. those allocated through kmem_cache_alloc() and the kmalloc()-family of functions), the copy size is compared against the size of the object being copied. For example, if copy_from_user() is writing to a structure that was allocated as size 64, but the copy gets tricked into trying to write 65 bytes, hardened usercopy will catch it and kill the process. For testing hardened usercopy, lkdtm gained several new tests: USERCOPY_HEAP_SIZE_TO, USERCOPY_HEAP_SIZE_FROM, USERCOPY_STACK_FRAME_TO,
USERCOPY_STACK_FRAME_FROM, USERCOPY_STACK_BEYOND, and USERCOPY_KERNEL. Additionally, USERCOPY_HEAP_FLAG_TO and USERCOPY_HEAP_FLAG_FROM were added to test what will be coming next for hardened usercopy: flagging slab memory as safe for copy to/from user-space , effectively whitelisting certainly slab caches, as done by PAX_USERCOPY. This further reduces the scope of what s allowed to be copied to/from, since most kernel memory is not intended to ever be exposed to user-space. Adding this logic will require some reorganization of usercopy code to add some new APIs, as PAX_USERCOPY s approach to handling special-cases is to add bounce-copies (copy from slab to stack, then copy to userspace) as needed, which is unlikely to be acceptable upstream. seccomp reordered after ptrace By its original design, seccomp filtering happened before ptrace so that seccomp-based ptracers (i.e. SECCOMP_RET_TRACE) could explicitly bypass seccomp filtering and force a desired syscall. Nothing actually used this feature, and as it turns out, it s not compatible with process launchers that install seccomp filters (e.g. systemd, lxc) since as long as the ptrace and fork syscalls are allowed (and fork is needed for any sensible container environment), a process could spawn a tracer to help bypass a filter by injecting syscalls. After Andy Lutomirski convinced me that ordering ptrace first does not change the attack surface of a running process (unless all syscalls are blacklisted, the entire ptrace attack surface will always be exposed), I rearranged things. Now there is no (expected) way to bypass seccomp filters, and containers with seccomp filters can allow ptrace again. That s it for v4.8! The merge window is open for v4.9

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

3 October 2016

Kees Cook: security things in Linux v4.7

Previously: v4.6. Onward to security things I found interesting in Linux v4.7: KASLR text base offset for MIPS Matt Redfearn added text base address KASLR to MIPS, similar to what s available on x86 and arm64. As done with x86, MIPS attempts to gather entropy from various build-time, run-time, and CPU locations in an effort to find reasonable sources during early-boot. MIPS doesn t yet have anything as strong as x86 s RDRAND (though most have an instruction counter like x86 s RDTSC), but it does have the benefit of being able to use Device Tree (i.e. the /chosen/kaslr-seed property) like arm64 does. By my understanding, even without Device Tree, MIPS KASLR entropy should be as strong as pre-RDRAND x86 entropy, which is more than sufficient for what is, similar to x86, not a huge KASLR range anyway: default 8 bits (a span of 16MB with 64KB alignment), though CONFIG_RANDOMIZE_BASE_MAX_OFFSET can be tuned to the device s memory, giving a maximum of 11 bits on 32-bit, and 15 bits on EVA or 64-bit. SLAB freelist ASLR Thomas Garnier added CONFIG_SLAB_FREELIST_RANDOM to make slab allocation layouts less deterministic with a per-boot randomized freelist order. This raises the bar for successful kernel slab attacks. Attackers will need to either find additional bugs to help leak slab layout information or will need to perform more complex grooming during an attack. Thomas wrote a post describing the feature in more detail here: Randomizing the Linux kernel heap freelists. (SLAB is done in v4.7, and SLUB in v4.8.) eBPF JIT constant blinding Daniel Borkmann implemented constant blinding in the eBPF JIT subsystem. With strong kernel memory protections (CONFIG_DEBUG_RODATA) in place, and with the segregation of user-space memory execution from kernel (i.e SMEP, PXN, CONFIG_CPU_SW_DOMAIN_PAN), having a place where user-space can inject content into an executable area of kernel memory becomes very high-value to an attacker. The eBPF JIT was exactly such a thing: the use of BPF constants could result in the JIT producing instruction flows that could include attacker-controlled instructions (e.g. by directing execution into the middle of an instruction with a constant that would be interpreted as a native instruction). The eBPF JIT already uses a number of other defensive tricks (e.g. random starting position), but this added randomized blinding to any BPF constants, which makes building a malicious execution path in the eBPF JIT memory much more difficult (and helps block attempts at JIT spraying to bypass other protections). Elena Reshetova updated a 2012 proof-of-concept attack to succeed against modern kernels to help provide a working example of what needed fixing in the JIT. This serves as a thorough regression test for the protection. The cBPF JITs that exist in ARM, MIPS, PowerPC, and Sparc still need to be updated to eBPF, but when they do, they ll gain all these protections immediatley. Bottom line is that if you enable the (disabled-by-default) bpf_jit_enable sysctl, be sure to set the bpf_jit_harden sysctl to 2 (to perform blinding even for root). fix brk ASLR weakness on arm64 compat There have been a few ASLR fixes recently (e.g. ET_DYN, x86 32-bit unlimited stack), and while reviewing some suggested fixes to arm64 brk ASLR code from Jon Medhurst, I noticed that arm64 s brk ASLR entropy was slightly too low (less than 1 bit) for 64-bit and noticeably lower (by 2 bits) for 32-bit compat processes when compared to native 32-bit arm. I simplified the code by using literals for the entropy. Maybe we can add a sysctl some day to control brk ASLR entropy like was done for mmap ASLR entropy. LoadPin LSM LSM stacking is well-defined since v4.2, so I finally upstreamed a small LSM that implements a protection I wrote for Chrome OS several years back. On systems with a static root of trust that extends to the filesystem level (e.g. Chrome OS s coreboot+depthcharge boot firmware chaining to dm-verity, or a system booting from read-only media), it s redundant to sign kernel modules (you ve already got the modules on read-only media: they can t change). The kernel just needs to know they re all coming from the correct location. (And this solves loading known-good firmware too, since there is no convention for signed firmware in the kernel yet.) LoadPin requires that all modules, firmware, etc come from the same mount (and assumes that the first loaded file defines which mount is correct , hence load pinning ). That s it for v4.7. Prepare yourself for v4.8 next!

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

1 October 2016

Kees Cook: security things in Linux v4.6

Previously: v4.5. The v4.6 Linux kernel release included a bunch of stuff, with much more of it under the KSPP umbrella. seccomp support for parisc Helge Deller added seccomp support for parisc, which including plumbing support for PTRACE_GETREGSET to get the self-tests working. x86 32-bit mmap ASLR vs unlimited stack fixed Hector Marco-Gisbert removed a long-standing limitation to mmap ASLR on 32-bit x86, where setting an unlimited stack (e.g. ulimit -s unlimited ) would turn off mmap ASLR (which provided a way to bypass ASLR when executing setuid processes). Given that ASLR entropy can now be controlled directly (see the v4.5 post), and that the cases where this created an actual problem are very rare, means that if a system sees collisions between unlimited stack and mmap ASLR, they can just adjust the 32-bit ASLR entropy instead. x86 execute-only memory Dave Hansen added Protection Key support for future x86 CPUs and, as part of this, implemented support for execute only memory in user-space. On pkeys-supporting CPUs, using mmap(..., PROT_EXEC) (i.e. without PROT_READ) will mean that the memory can be executed but cannot be read (or written). This provides some mitigation against automated ROP gadget finding where an executable is read out of memory to find places that can be used to build a malicious execution path. Using this will require changing some linker behavior (to avoid putting data in executable areas), but seems to otherwise Just Work. I m looking forward to either emulated QEmu support or access to one of these fancy CPUs. CONFIG_DEBUG_RODATA enabled by default on arm and arm64, and mandatory on x86 Ard Biesheuvel (arm64) and I (arm) made the poorly-named CONFIG_DEBUG_RODATA enabled by default. This feature controls whether the kernel enforces proper memory protections on its own memory regions (code memory is executable and read-only, read-only data is actually read-only and non-executable, and writable data is non-executable). This protection is a fundamental security primitive for kernel self-protection, so making it on-by-default is required to start any kind of attack surface reduction within the kernel. On x86 CONFIG_DEBUG_RODATA was already enabled by default, but, at Ingo Molnar s suggestion, I made it mandatory: CONFIG_DEBUG_RODATA cannot be turned off on x86. I expect we ll get there with arm and arm64 too, but the protection is still somewhat new on these architectures, so it s reasonable to continue to leave an out for developers that find themselves tripping over it. arm64 KASLR text base offset Ard Biesheuvel reworked a ton of arm64 infrastructure to support kernel relocation and, building on that, Kernel Address Space Layout Randomization of the kernel text base offset (and module base offset). As with x86 text base KASLR, this is a probabilistic defense that raises the bar for kernel attacks where finding the KASLR offset must be added to the chain of exploits used for a successful attack. One big difference from x86 is that the entropy for the KASLR must come either from Device Tree (in the /chosen/kaslr-seed property) or from UEFI (via EFI_RNG_PROTOCOL), so if you re building arm64 devices, make sure you have a strong source of early-boot entropy that you can expose through your boot-firmware or boot-loader. zero-poison after free Laura Abbott reworked a bunch of the kernel memory management debugging code to add zeroing of freed memory, similar to PaX/Grsecurity s PAX_MEMORY_SANITIZE feature. This feature means that memory is cleared at free, wiping any sensitive data so it doesn t have an opportunity to leak in various ways (e.g. accidentally uninitialized structures or padding), and that certain types of use-after-free flaws cannot be exploited since the memory has been wiped. To take things even a step further, the poisoning can be verified at allocation time to make sure that nothing wrote to it between free and allocation (called sanity checking ), which can catch another small subset of flaws. To understand the pieces of this, it s worth describing that the kernel s higher level allocator, the page allocator (e.g. __get_free_pages()) is used by the finer-grained slab allocator (e.g. kmem_cache_alloc(), kmalloc()). Poisoning is handled separately in both allocators. The zero-poisoning happens at the page allocator level. Since the slab allocators tend to do their own allocation/freeing, their poisoning happens separately (since on slab free nothing has been freed up to the page allocator). Only limited performance tuning has been done, so the penalty is rather high at the moment, at about 9% when doing a kernel build workload. Future work will include some exclusion of frequently-freed caches (similar to PAX_MEMORY_SANITIZE), and making the options entirely CONFIG controlled (right now both CONFIGs are needed to build in the code, and a kernel command line is needed to activate it). Performing the sanity checking (mentioned above) adds another roughly 3% penalty. In the general case (and once the performance of the poisoning is improved), the security value of the sanity checking isn t worth the performance trade-off. Tests for the features can be found in lkdtm as READ_AFTER_FREE and READ_BUDDY_AFTER_FREE. If you re feeling especially paranoid and have enabled sanity-checking, WRITE_AFTER_FREE and WRITE_BUDDY_AFTER_FREE can test these as well. To perform zero-poisoning of page allocations and (currently non-zero) poisoning of slab allocations, build with:
CONFIG_DEBUG_PAGEALLOC=n
CONFIG_PAGE_POISONING=y
CONFIG_PAGE_POISONING_NO_SANITY=y
CONFIG_PAGE_POISONING_ZERO=y
CONFIG_SLUB_DEBUG=y
and enable the page allocator poisoning and slab allocator poisoning at boot with this on the kernel command line:
page_poison=on slub_debug=P
To add sanity-checking, change PAGE_POISONING_NO_SANITY=n, and add F to slub_debug as slub_debug=PF . read-only after init I added the infrastructure to support making certain kernel memory read-only after kernel initialization (inspired by a small part of PaX/Grsecurity s KERNEXEC functionality). The goal is to continue to reduce the attack surface within the kernel by making even more of the memory, especially function pointer tables, read-only (which depends on CONFIG_DEBUG_RODATA above). Function pointer tables (and similar structures) are frequently targeted by attackers when redirecting execution. While many are already declared const in the kernel source code, making them read-only (and therefore unavailable to attackers) for their entire lifetime, there is a class of variables that get initialized during kernel (and module) start-up (i.e. written to during functions that are marked __init ) and then never (intentionally) written to again. Some examples are things like the VDSO, vector tables, arch-specific callbacks, etc. As it turns out, most architectures with kernel memory protection already delay making their data read-only until after __init (see mark_rodata_ro()), so it s trivial to declare a new data section ( .data..ro_after_init ) and add it to the existing read-only data section ( .rodata ). Kernel structures can be annotated with the new section (via the __ro_after_init macro), and they ll become read-only once boot has finished. The next step for attack surface reduction infrastructure will be to create a kernel memory region that is passively read-only, but can be made temporarily writable (by a single un-preemptable CPU), for storing sensitive structures that are written to only very rarely. Once this is done, much more of the kernel s attack surface can be made read-only for the majority of its lifetime. As people identify places where __ro_after_init can be used, we can grow the protection. A good place to start is to look through the PaX/Grsecurity patch to find uses of __read_only on variables that are only written to during __init functions. The rest are places that will need the temporarily-writable infrastructure (PaX/Grsecurity uses pax_open_kernel()/pax_close_kernel() for these). That s it for v4.6, next up will be v4.7!

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

28 September 2016

Kees Cook: security things in Linux v4.5

Previously: v4.4. Some things I found interesting in the Linux kernel v4.5: CONFIG_IO_STRICT_DEVMEM The CONFIG_STRICT_DEVMEM setting that has existed for a long time already protects system RAM from being accessible through the /dev/mem device node to root in user-space. Dan Williams added CONFIG_IO_STRICT_DEVMEM to extend this so that if a kernel driver has reserved a device memory region for use, it will become unavailable to /dev/mem also. The reservation in the kernel was to keep other kernel things from using the memory, so this is just common sense to make sure user-space can t stomp on it either. Everyone should have this enabled. (And if you have a system where you discover you need IO memory access from userspace, you can boot with iomem=relaxed to disable this at runtime.) If you re looking to create a very bright line between user-space having access to device memory, it s worth noting that if a device driver is a module, a malicious root user can just unload the module (freeing the kernel memory reservation), fiddle with the device memory, and then reload the driver module. So either just leave out /dev/mem entirely (not currently possible with upstream), build a monolithic kernel (no modules), or otherwise block (un)loading of modules (/proc/sys/kernel/modules_disabled). ptrace fsuid checking Jann Horn fixed some corner-cases in how ptrace access checks were handled on special files in /proc. For example, prior to this fix, if a setuid process temporarily dropped privileges to perform actions as a regular user, the ptrace checks would not notice the reduced privilege, possibly allowing a regular user to trick a privileged process into disclosing things out of /proc (ASLR offsets, restricted directories, etc) that they normally would be restricted from seeing. ASLR entropy sysctl Daniel Cashman standardized the way architectures declare their maximum user-space ASLR entropy (CONFIG_ARCH_MMAP_RND_BITS_MAX) and then created a sysctl (/proc/sys/vm/mmap_rnd_bits) so that system owners could crank up entropy. For example, the default entropy on 32-bit ARM was 8 bits, but the maximum could be as much as 16. If your 64-bit kernel is built with CONFIG_COMPAT, there s a compat version of the sysctl as well, for controlling the ASLR entropy of 32-bit processes: /proc/sys/vm/mmap_rnd_compat_bits. Here s how to crank your entropy to the max, without regard to what architecture you re on:
for i in "" "compat_"; do f=/proc/sys/vm/mmap_rnd_$ i bits; n=$(cat $f); while echo $n > $f ; do n=$(( n + 1 )); done; done
strict sysctl writes Two years ago I added a sysctl for treating sysctl writes more like regular files (i.e. what s written first is what appears at the start), rather than like a ring-buffer (what s written last is what appears first). At the time it wasn t clear what might break if this was enabled, so a WARN was added to the kernel. Since only one such string showed up in searches over the last two years, the strict writing mode was made the default. The setting remains available as /proc/sys/kernel/sysctl_writes_strict. seccomp UM support Micka l Sala n added seccomp support (and selftests) for user-mode Linux. Moar architectures! seccomp NNP vs TSYNC fix Jann Horn noticed and fixed a problem where if a seccomp filter was already in place on a process (after being installed by a privileged process like systemd, a container launcher, etc) then the setting of the no new privs flag could be bypassed when adding filters with the SECCOMP_FILTER_FLAG_TSYNC flag set. Bypassing NNP meant it might be possible to trick a buggy setuid program into doing things as root after a seccomp filter forced a privilege drop to fail (generally referred to as the sendmail setuid flaw ). With NNP set, a setuid program can t be run in the first place. That s it! Next I ll cover v4.6 Edit: Added notes about iomem=

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

27 September 2016

Kees Cook: security things in Linux v4.4

Previously: v4.3. Continuing with interesting security things in the Linux kernel, here s v4.4. As before, if you think there s stuff I missed that should get some attention, please let me know. seccomp Checkpoint/Restore-In-Userspace Tycho Andersen added a way to extract and restore seccomp filters from running processes via PTRACE_SECCOMP_GET_FILTER under CONFIG_CHECKPOINT_RESTORE. This is a continuation of his work (that I failed to mention in my prior post) from v4.3, which introduced a way to suspend and resume seccomp filters. As I mentioned at the time (and for which he continues to quote me) this feature gives me the creeps. :) x86 W^X detection Stephen Smalley noticed that there was still a range of kernel memory (just past the end of the kernel code itself) that was incorrectly marked writable and executable, defeating the point of CONFIG_DEBUG_RODATA which seeks to eliminate these kinds of memory ranges. He corrected this in v4.3 and added CONFIG_DEBUG_WX in v4.4 which performs a scan of memory at boot time and yells loudly if unexpected memory protection are found. To nobody s delight, it was shortly discovered the UEFI leaves chunks of memory in this state too, which posed an ugly-to-solve problem (which Matt Fleming addressed in v4.6). x86_64 vsyscall CONFIG I introduced a way to control the mode of the x86_64 vsyscall with a build-time CONFIG selection, though the choice I really care about is CONFIG_LEGACY_VSYSCALL_NONE, to force the vsyscall memory region off by default. The vsyscall memory region was always mapped into process memory at a fixed location, and it originally posed a security risk as a ROP gadget execution target. The vsyscall emulation mode was added to mitigate the problem, but it still left fixed-position static memory content in all processes, which could still pose a security risk. The good news is that glibc since version 2.15 doesn t need vsyscall at all, so it can just be removed entirely. Any kernel built this way that discovered they needed to support a pre-2.15 glibc could still re-enable it at the kernel command line with vsyscall=emulate . That s it for v4.4. Tune in tomorrow for v4.5!

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

26 September 2016

Kees Cook: security things in Linux v4.3

When I gave my State of the Kernel Self-Protection Project presentation at the 2016 Linux Security Summit, I included some slides covering some quick bullet points on things I found of interest in recent Linux kernel releases. Since there wasn t a lot of time to talk about them all, I figured I d make some short blog posts here about the stuff I was paying attention to, along with links to more information. This certainly isn t everything security-related or generally of interest, but they re the things I thought needed to be pointed out. If there s something security-related you think I should cover from v4.3, please mention it in the comments. I m sure I haven t caught everything. :) A note on timing and context: the momentum for starting the Kernel Self Protection Project got rolling well before it was officially announced on November 5th last year. To that end, I included stuff from v4.3 (which was developed in the months leading up to November) under the umbrella of the project, since the goals of KSPP aren t unique to the project nor must the goals be met by people that are explicitly participating in it. Additionally, not everything I think worth mentioning here technically falls under the kernel self-protection ideal anyway some things are just really interesting userspace-facing features. So, to that end, here are things I found interesting in v4.3: CONFIG_CPU_SW_DOMAIN_PAN Russell King implemented this feature for ARM which provides emulated segregation of user-space memory when running in kernel mode, by using the ARM Domain access control feature. This is similar to a combination of Privileged eXecute Never (PXN, in later ARMv7 CPUs) and Privileged Access Never (PAN, coming in future ARMv8.1 CPUs): the kernel cannot execute user-space memory, and cannot read/write user-space memory unless it was explicitly prepared to do so. This stops a huge set of common kernel exploitation methods, where either a malicious executable payload has been built in user-space memory and the kernel was redirected to run it, or where malicious data structures have been built in user-space memory and the kernel was tricked into dereferencing the memory, ultimately leading to a redirection of execution flow. This raises the bar for attackers since they can no longer trivially build code or structures in user-space where they control the memory layout, locations, etc. Instead, an attacker must find areas in kernel memory that are writable (and in the case of code, executable), where they can discover the location as well. For an attacker, there are vastly fewer places where this is possible in kernel memory as opposed to user-space memory. And as we continue to reduce the attack surface of the kernel, these opportunities will continue to shrink. While hardware support for this kind of segregation exists in s390 (natively separate memory spaces), ARM (PXN and PAN as mentioned above), and very recent x86 (SMEP since Ivy-Bridge, SMAP since Skylake), ARM is the first upstream architecture to provide this emulation for existing hardware. Everyone running ARMv7 CPUs with this kernel feature enabled suddenly gains the protection. Similar emulation protections (PAX_MEMORY_UDEREF) have been available in PaX/Grsecurity for a while, and I m delighted to see a form of this land in upstream finally. To test this kernel protection, the ACCESS_USERSPACE and EXEC_USERSPACE triggers for lkdtm have existed since Linux v3.13, when they were introduced in anticipation of the x86 SMEP and SMAP features. Ambient Capabilities Andy Lutomirski (with Christoph Lameter and Serge Hallyn) implemented a way for processes to pass capabilities across exec() in a sensible manner. Until Ambient Capabilities, any capabilities available to a process would only be passed to a child process if the new executable was correctly marked with filesystem capability bits. This turns out to be a real headache for anyone trying to build an even marginally complex least privilege execution environment. The case that Chrome OS ran into was having a network service daemon responsible for calling out to helper tools that would perform various networking operations. Keeping the daemon not running as root and retaining the needed capabilities in children required conflicting or crazy filesystem capabilities organized across all the binaries in the expected tree of privileged processes. (For example you may need to set filesystem capabilities on bash!) By being able to explicitly pass capabilities at runtime (instead of based on filesystem markings), this becomes much easier. For more details, the commit message is well-written, almost twice as long as than the code changes, and contains a test case. If that isn t enough, there is a self-test available in tools/testing/selftests/capabilities/ too. PowerPC and Tile support for seccomp filter Michael Ellerman added support for seccomp to PowerPC, and Chris Metcalf added support to Tile. As the seccomp maintainer, I get excited when an architecture adds support, so here we are with two. Also included were updates to the seccomp self-tests (in tools/testing/selftests/seccomp), to help make sure everything continues working correctly. That s it for v4.3. If I missed stuff you found interesting, please let me know! I m going to try to get more per-version posts out in time to catch up to v4.8, which appears to be tentatively scheduled for release this coming weekend.

2016, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

11 November 2015

Kees Cook: evolution of seccomp

I m excited to see other people thinking about userspace-to-kernel attack surface reduction ideas. Theo de Raadt recently published slides describing Pledge. This uses the same ideas that seccomp implements, but with less granularity. While seccomp works at the individual syscall level and in addition to killing processes, it allows for signaling, tracing, and errno spoofing. As de Raadt mentions, Pledge could be implemented with seccomp very easily: libseccomp would just categorize syscalls. I don t really understand the presentation s mention of Optional Security , though. Pledge, like seccomp, is an opt-in feature. Nothing in the kernel refuses to run unpledged programs. I assume his point was that when it gets ubiquitously built into programs (like stack protector), it s effectively not optional (which is alluded to later as comprehensive applicability ~= mandatory mitigation ). Regardless, this sensible (though optional) design gets me back to his slide on seccomp, which seems to have a number of misunderstandings: OpenBSD has some interesting advantages in the syscall filtering department, especially around sockets. Right now, it s hard for Linux syscall filtering to understand why a given socket is being used. Something like SOCK_DNS seems like it could be quite handy. Another nice feature of Pledge is the path whitelist feature. As it s still under development, I hope they expand this to include more things than just paths. Argument inspection is a weak point for seccomp, but under Linux, most of the arguments are ultimately exposed to the LSM layer. Last year I experimented with creating a seccomp LSM for path matching where programs could declare whitelists, similar to standard LSMs. So, yes, Linux could match this API on seccomp . It d just take some extensions to libseccomp to implement pledge(), as I described at the top. With OpenBSD doing a bunch of analysis work on common programs, it d be excellent to see this usable on Linux too. So far on Linux, only a few programs (e.g. Chrome, vsftpd) have bothered to do this using seccomp, and it could be argued that this is ultimately due to how fine grained it is.

2015, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

6 November 2015

Matthew Garrett: Why improving kernel security is important

The Washington Post published an article today which describes the ongoing tension between the security community and Linux kernel developers. This has been roundly denounced as FUD, with Rob Graham going so far as to claim that nobody ever attacks the kernel.

Unfortunately he's entirely and demonstrably wrong, it's not FUD and the state of security in the kernel is currently far short of where it should be.

An example. Recent versions of Android use SELinux to confine applications. Even if you have full control over an application running on Android, the SELinux rules make it very difficult to do anything especially user-hostile. Hacking Team, the GPL-violating Italian company who sells surveillance software to human rights abusers, found that this impeded their ability to drop their spyware onto targets' devices. So they took advantage of the fact that many Android devices shipped a kernel with a flawed copy_from_user() implementation that allowed them to copy arbitrary userspace data over arbitrary kernel code, thus allowing them to disable SELinux.

If we could trust userspace applications, we wouldn't need SELinux. But we assume that userspace code may be buggy, misconfigured or actively hostile, and we use technologies such as SELinux or AppArmor to restrict its behaviour. There's simply too much userspace code for us to guarantee that it's all correct, so we do our best to prevent it from doing harm anyway.

This is significantly less true in the kernel. The model up until now has largely been "Fix security bugs as we find them", an approach that fails on two levels:

1) Once we find them and fix them, there's still a window between the fixed version being available and it actually being deployed
2) The forces of good may not be the first ones to find them

This reactive approach is fine for a world where it's possible to push out software updates without having to perform extensive testing first, a world where the only people hunting for interesting kernel vulnerabilities are nice people. This isn't that world, and this approach isn't fine.

Just as features like SELinux allow us to reduce the harm that can occur if a new userspace vulnerability is found, we can add features to the kernel that make it more difficult (or impossible) for attackers to turn a kernel bug into an exploitable vulnerability. The number of people using Linux systems is increasing every day, and many of these users depend on the security of these systems in critical ways. It's vital that we do what we can to avoid their trust being misplaced.

Many useful mitigation features already exist in the Grsecurity patchset, but a combination of technical disagreements around certain features, personality conflicts and an apparent lack of enthusiasm on the side of upstream kernel developers has resulted in almost none of it landing in the kernels that most people use. Kees Cook has proposed a new project to start making a more concerted effort to migrate components of Grsecurity to upstream. If you rely on the kernel being a secure component, either because you ship a product based on it or because you use it yourself, you should probably be doing what you can to support this.

Microsoft received entirely justifiable criticism for the terrible state of security on their platform. They responded by introducing cutting-edge security features across the OS, including the kernel. Accusing anyone who says we need to do the same of spreading FUD is risking free software being sidelined in favour of proprietary software providing more real-world security. That doesn't seem like a good outcome.

comment count unavailable comments

27 July 2015

Kees Cook: 3D printing Poe

I helped print this statue of Edgar Allan Poe, through We the Builders , who coordinate large-scale crowd-sourced 3D print jobs: Poe's Face You can see one of my parts here on top, with -Kees on the piece with the funky hair strand: Poe's Hair The MakerWare I run on Ubuntu works well. I wish they were correctly signing their repositories. Even if I use non-SSL to fetch their key, as their Ubuntu/Debian instructions recommend, it still doesn t match the packages:
W: GPG error: http://downloads.makerbot.com trusty Release: The following signatures were invalid: BADSIG 3D019B838FB1487F MakerBot Industries dev team <dev@makerbot.com>
And it s not just my APT configuration:
$ wget http://downloads.makerbot.com/makerware/ubuntu/dists/trusty/Release.gpg
$ wget http://downloads.makerbot.com/makerware/ubuntu/dists/trusty/Release
$ gpg --verify Release.gpg Release
gpg: Signature made Wed 11 Mar 2015 12:43:07 PM PDT using RSA key ID 8FB1487F
gpg: requesting key 8FB1487F from hkp server pgp.mit.edu
gpg: key 8FB1487F: public key "MakerBot Industries LLC (Software development team) <dev@makerbot.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
gpg: BAD signature from "MakerBot Industries LLC (Software development team) <dev@makerbot.com>"
$ grep ^Date Release
Date: Tue, 09 Jun 2015 19:41:02 UTC
Looks like they re updating their Release file without updating the signature file. (The signature is from March, but the Release file is from June. Oops!)

2015, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

31 October 2014

Russell Coker: Links October 2014

The Verge has an interesting article about Tim Cook (Apple CEO) coming out [1]. Tim says if hearing that the CEO of Apple is gay can help someone struggling to come to terms with who he or she is, or bring comfort to anyone who feels alone, or inspire people to insist on their equality, then it s worth the trade-off with my own privacy . Graydon2 wrote an insightful article about the right-wing libertarian sock-puppets of silicon valley [2]. George Monbiot wrote an insightful article for The Guardian about the way that double-speak facilitates killing people [3]. He is correct that the media should hold government accountable for such use of language instead of perpetuating it. Anne Th riault wrote an insightful article for Vice about the presumption of innocence and sex crimes [4]. Dr Nerdlove wrote an interesting article about Gamergate as the extinction burst of gamer culture [5], we can only hope. Shweta Narayan wrote an insightful article about Category Structure and Oppression [6]. I can t summarise it because it s a complex concept, read the article. Some Debian users who don t like Systemd have started a Debian Fork project [7], which so far just has a web site and nothing else. I expect that they will never write any code. But it would be good if they did, they would learn about how an OS works and maybe they wouldn t disagree so much with the people who have experience in developing system software. A GamerGate terrorist in Utah forces Anita Sarkeesian to cancel a lecture [8]. I expect that the reaction will be different when (not if) an Islamic group tries to get a lecture cancelled in a similar manner. Model View Culture has an insightful article by Erika Lynn Abigail about Autistics in Silicon Valley [9]. Katie McDonough wrote an interesting article for Salon about Ed Champion and what to do about men who abuse women [10]. It s worth reading that while thinking about the FOSS community

25 October 2014

Miriam Ruiz: Video game players and Gamers are different things

Even though the Wikipedia defines gamer as someone who partakes in interactive gaming, such as (predominantly) video games or board games , this doesn t really gets close to that term means socially at the moment. Going back to Wikipedia, we find that the video game subculture is a form of new media subculture that has been influenced by video games , so it might be quite accurate to define gamers as members of that subculture. You will find that most of the uses of the term gamer in the social networks and in the blogosphere refer to that. Please notice that, even though it is quite likely that most of the gamers play video games, the other way round does not need to be true and, in fact, it isn t. Not everyone who plays video games belongs to the video game subculture, shares their point of view, their values and aesthetics, or even know about it. Kind of like what happens with the word hacker . Not everyone who hacks around with a computer belongs to the hacker subculture. Mostly everyone who has access to the technology plays video games now. From babies and kids to grandparents. And people play them in every possible technological system around, not only on video game consoles or personal computers, but alse on mobile phones, tablets, web browsers. And many of those people who use different kind of technologies to play video games are not gamers. Not in the sense of belonging to the video game subculture. It is important to acknowledge that: that the video game subculture does not have the monopoly over video games or the video game developing industry anymore. As you can imagine, all this rand doesn t come from nowhere. During the last months, we have been witnessing a fight between some conservative core members of the video game subculture and people who want to bring some fresh air into the sociocultural elements of that subculture. Namely, that women shouldn t be discriminated inside it. As every time that a women raises her voice to complain about anything in the Internet, they have been subjected to insults, attacks, rape and death threats, etc. I m talking about something called #GamerGate, and even though I m not going to get into it, I will provide some URLs in case someone might be interested. Please acknowledge that not all the points of view might be represented in this list (in fact, they are not, as I won t be promoting in my blog things that I severely disagree with), so search the web for more information if you want to get that. I ve never been a gamer myself, meaning part of the subculture I mentioned. At some point I was probably closer tho the core values they had then than I am now. In any case, video games have already consolidated themselves as an important part of current culture, entertainment, education and socialization, and are definitely here to stay. That will probably mean that the percentage of gamers (members of the video game subculture) will become smaller. as the number of non-gamer video game players keeps raising.

30 September 2014

Russell Coker: Links September 2014

Matt Palmer wrote a short but informative post about enabling DNS in a zone [1]. I really should setup DNSSEC on my own zones. Paul Wayper has some insightful comments about the Liberal party s nasty policies towards the unemployed [2]. We really need a Basic Income in Australia. Joseph Heath wrote an interesting and insightful article about the decline of the democratic process [3]. While most of his points are really good I m dubious of his claims about twitter. When used skillfully twitter can provide short insights into topics and teasers for linked articles. Sarah O wrote an insightful article about NotAllMen/YesAllWomen [4]. I can t summarise it well in a paragraph, I recommend reading it all. Betsy Haibel wrote an informative article about harassment by proxy on the Internet [5]. Everyone should learn about this before getting involved in discussions about controversial issues. George Monbiot wrote an insightful and interesting article about the referendum for Scottish independence and the failures of the media [6]. Mychal Denzel Smith wrote an insightful article How to know that you hate women [7]. Sam Byford wrote an informative article about Google s plans to develop and promote cheap Android phones for developing countries [8]. That s a good investment in future market share by Google and good for the spread of knowledge among people all around the world. I hope that this research also leads to cheap and reliable Android devices for poor people in first-world countries. Deb Chachra wrote an insightful and disturbing article about the culture of non-consent in the IT industry [9]. This is something we need to fix. David Hill wrote an interesting and informative article about the way that computer game journalism works and how it relates to GamerGate [10]. Anita Sarkeesian shares the most radical thing that you can do to support women online [11]. Wow, the world sucks more badly than I realised. Michael Daly wrote an article about the latest evil from the NRA [12]. The NRA continues to demonstrate that claims about good people with guns are lies, the NRA are evil people with guns.

8 September 2014

Jaldhar Vyas: Debconf 14 - Days 1 and 2

Unfortunately I was not able to attend debconf this year but thanks to the awesome video team the all the talks are available for your viewing pleasure. In order to recreate an authentic Portland experience, I took my laptop into the shower along with a vegan donut and had my children stand outside yelling excerpts from salon.com in whiny Canadianesque accents. Here are some notes I took as I watched the talks. Welcome Talk
Debian in the Dark Ages of Free software - Stefan Zacchiroli Weapons of the Geek - Gabriella Coleman bugs.debian.org -- Database Ho! - Don Armstrong Grub Ancient and Modern - Colin and Watson One year of fedmsg in Debian - Nicolas Dandrimont Coming of Age: My Life with Debian - Christine Spang Status report of the Debian Printing Team - Didier Raboud

31 July 2014

Craig Small: Linux Capabilities

I was recently updating some code that uses fping. Initially it used exec() that was redirected to a temporary file but I changed it to use popen. While it had been a while since I ve done this sort of thing, I do recall there was an issue with running popen on setuid binary. A later found it is mainly around setuid scripts which are very problematic and there are good reasons why you don t do this. Anyhow, the program worked fine which surprised me. Was fping setuid root to get the raw socket?
$ ls -l /usr/bin/fping
-rwxr-xr-x 1 root root 31464 May  6 21:42 /usr/bin/fping
It wasn t which at first all I thought ok, so that s why popen is happy . The way that fping and other programs work is they bind to a raw socket. This socket sits below the normal type sockets such as the ones used for TCP and UDP and normal users cannot use them by default. So how did fping work it s magic and get access to this socket? It used Capabilities. Previously getting privileged features had a big problem; it was an all or nothing thing. You want access to a raw socket? Sure, be setuid but that means you also could, for example, read any file on the system or set passwords. Capabilites provide a way of giving programs some better level of access, but not a blank cheque. The tool getcap is the way of determining what capabilities are found on a file. These capabilities are attributes on the file which, when the file is run, turn into capabilities or extra permissions. fping has the capability cap_net_raw+ep applied to it. This gives access to the RAW and PACKET sockets which is what fping needs. The +ep after the capability name means it is an Effective and Permitted capability, which describes what happens with child processes and dropping privileges. I hadn t seen these Capabilities before. They are a nice way to give your programs the access they need, but limiting the risk of something going wrong and having a rouge program running as root.

13 June 2014

Kees Cook: glibc select weakness fixed

In 2009, I reported this bug to glibc, describing the problem that exists when a program is using select, and has its open file descriptor resource limit raised above 1024 (FD_SETSIZE). If a network daemon starts using the FD_SET/FD_CLR glibc macros on fdset variables for descriptors larger than 1024, glibc will happily write beyond the end of the fdset variable, producing a buffer overflow condition. (This problem had existed since the introduction of the macros, so, for decades? I figured it was long over-due to have a report opened about it.) At the time, I was told this wasn t going to be fixed and every program using [select] must be considered buggy. 2 years later still more people kept asking for this feature and continued to be told no . But, as it turns out, a few months later after the most recent no , it got silently fixed anyway, with the bug left open as Won t Fix ! I m glad Florian did some house-cleaning on the glibc bug tracker, since I d otherwise never have noticed that this protection had been added to the ever-growing list of -D_FORTIFY_SOURCE=2 protections. I ll still recommend everyone use poll instead of select, but now I won t be so worried when I see requests to raise the open descriptor limit above 1024.

2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

7 May 2014

Kees Cook: Linux Security Summit 2014

The Linux Security Summit is happening in Chicago August 18th and 19th, just before LinuxCon. Send us some presentation and topic proposals, and join the conversation with other like-minded people. :) I d love to see what people have been working on, and what they d like to work on. Our general topics will hopefully include: The Call For Participation closes June 6th, so you ve got about a month, but earlier is better.

2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

3 February 2014

Kees Cook: compiler hardening in Ubuntu and Debian

Back in 2006, the compiler in Ubuntu was patched to enable most build-time security-hardening features (relro, stack protector, fortify source). I wasn t able to convince Debian to do the same, so Debian went the route of other distributions, adding security hardening flags during package builds only. I remain disappointed in this approach, because it means that someone who builds software without using the packaging tools on a non-Ubuntu system won t get those hardening features. Think of a sysadmin trying the latest nginx, or a vendor like Valve building games for distribution. On Ubuntu, when you do that ./configure && make you ll get the features automatically. Debian, at the time, didn t have a good way forward even for package builds since it lacked a concept of global package build flags . Happily, a solution (via dh) was developed about 2 years ago, and Debian package maintainers have been working to adopt it ever since. So, while I don t think any distro can match Ubuntu s method of security hardening compiler defaults, it is valuable to see the results of global package build flags in Debian on the package archive. I ve had an on-going graph of the state of build hardening on both Ubuntu and Debian for a while, but only recently did I put together a comparison of a default install. Very few people have all the packages in the archive installed, so it s a bit silly to only look at the archive statistics. But let s start there, just to describe what s being measured. Here s today s snapshot of Ubuntu s development archive for the past year (you can see development opening after a release every 6 months with an influx of new packages):
Here s today s snapshot of Debian s unstable archive for the past year (at the start of May you can see the archive unfreezing after the Wheezy release; the gaps were my analysis tool failing):
Ubuntu s lines are relatively flat because everything that can be built with hardening already is. Debian s graph is on a slow upward trend as more packages get migrated to dh to gain knowledge of the global flags. Each line in the graphs represents the count of source packages that contain binary packages that have at least 1 hit for a given category. ELF is just that: a source package that ultimately produces at least 1 binary package with at least 1 ELF binary in it (i.e. produces a compiled output). The Read-only Relocations ( relro ) hardening feature is almost always done for an ELF, excepting uncommon situations. As a result, the count of ELF and relro are close on Ubuntu. In fact, examining relro is a good indication of whether or not a source package got built with hardening of any kind. So, in Ubuntu, 91.5% of the archive is built with hardening, with Debian at 55.2%. The stack protector and fortify source features depend on characteristics of the source itself, and may not always be present in package s binaries even when hardening is enabled for the build (e.g. no functions got selected for stack protection, or no fortified glibc functions were used). Really these lines mostly indicate the count of packages that have a sufficiently high level of complexity that would trigger such protections. The PIE and immediate binding ( bind_now ) features are specifically enabled by a package maintainer. PIE can have a noticeable performance impact on CPU-register-starved architectures like i386 (ia32), so it is neither patched on in Ubuntu, nor part of the default flags in Debian. (And bind_now doesn t make much sense without PIE, so they usually go together.) It s worth noting, however, that it probably should be the default on amd64 (x86_64), which has plenty of available registers. Here is a comparison of default installed packages between the most recent stable releases of Ubuntu (13.10) and Debian (Wheezy). It s clear that what the average user gets with a default fresh install is better than what the archive-to-archive comparison shows. Debian s showing is better (74% built with hardening), though it is still clearly lagging behind Ubuntu (99%):

2014, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License.
Creative Commons License

Next.

Previous.